Overview

Dataset statistics

Number of variables: 10
Number of observations: 1000
Missing cells: 17
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 78.2 KiB
Average record size in memory: 80.1 B

Variable types

Categorical: 4
Numeric: 6

Alerts

feat.e is highly overall correlated with feat.i (High correlation)
feat.f is highly overall correlated with response (High correlation)
feat.i is highly overall correlated with feat.e (High correlation)
response is highly overall correlated with feat.f (High correlation)
feat.a has unique values (Unique)
feat.e has unique values (Unique)
feat.f has unique values (Unique)
feat.h has unique values (Unique)
feat.i has unique values (Unique)

Reproduction

Analysis started: 2022-11-23 20:34:29.326080
Analysis finished: 2022-11-23 20:34:44.529022
Duration: 15.2 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

response
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

1: 553
0: 447

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

ValueCountFrequency (%)
1 553
55.3%
0 447
44.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1 553
55.3%
0 447
44.7%

Most occurring characters

ValueCountFrequency (%)
1 553
55.3%
0 447
44.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 553
55.3%
0 447
44.7%

Most occurring scripts

ValueCountFrequency (%)
Common 1000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 553
55.3%
0 447
44.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 553
55.3%
0 447
44.7%

feat.a
Real number (ℝ)

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0483836
Minimum: -7.429324
Maximum: 10.72312
Zeros: 0
Zeros (%): 0.0%
Negative: 353
Negative (%): 35.3%
Memory size: 7.9 KiB

Quantile statistics

Minimum: -7.429324
5-th percentile: -3.8677529
Q1: -0.88497273
median: 1.0276289
Q3: 2.9938056
95-th percentile: 6.0284016
Maximum: 10.72312
Range: 18.152444
Interquartile range (IQR): 3.8787783

Descriptive statistics

Standard deviation: 2.9750849
Coefficient of variation (CV): 2.8377828
Kurtosis: -0.068601967
Mean: 1.0483836
Median Absolute Deviation (MAD): 1.950913
Skewness: 0.065392043
Sum: 1048.3836
Variance: 8.8511303
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-0.6814269397 1
 
0.1%
0.1177140185 1
 
0.1%
3.307156885 1
 
0.1%
1.362157988 1
 
0.1%
3.590945302 1
 
0.1%
5.141543585 1
 
0.1%
6.898744046 1
 
0.1%
0.9148148358 1
 
0.1%
-5.747153271 1
 
0.1%
1.094578014 1
 
0.1%
Other values (990) 990
99.0%
ValueCountFrequency (%)
-7.429324037 1
0.1%
-6.982768395 1
0.1%
-6.929446856 1
0.1%
-6.805099011 1
0.1%
-6.523753407 1
0.1%
-6.397694581 1
0.1%
-5.927506627 1
0.1%
-5.747153271 1
0.1%
-5.674963089 1
0.1%
-5.631899332 1
0.1%
ValueCountFrequency (%)
10.7231198 1
0.1%
9.07514201 1
0.1%
9.054576998 1
0.1%
8.726349291 1
0.1%
8.714374438 1
0.1%
8.65907834 1
0.1%
8.463993632 1
0.1%
8.374181476 1
0.1%
8.290679957 1
0.1%
8.250320061 1
0.1%

feat.b
Real number (ℝ)

Distinct: 992
Distinct (%): 100.0%
Missing: 8
Missing (%): 0.8%
Infinite: 0
Infinite (%): 0.0%
Mean: -3.9415151
Minimum: -8.5717913
Maximum: 1.0855562
Zeros: 0
Zeros (%): 0.0%
Negative: 987
Negative (%): 98.7%
Memory size: 7.9 KiB

Quantile statistics

Minimum: -8.5717913
5-th percentile: -6.503702
Q1: -4.9824809
median: -3.9177214
Q3: -2.8833259
95-th percentile: -1.5984046
Maximum: 1.0855562
Range: 9.6573476
Interquartile range (IQR): 2.099155

Descriptive statistics

Standard deviation: 1.512669
Coefficient of variation (CV): -0.38377856
Kurtosis: -0.083784143
Mean: -3.9415151
Median Absolute Deviation (MAD): 1.0592358
Skewness: -0.022693233
Sum: -3909.983
Variance: 2.2881674
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-5.493698087 1
 
0.1%
-2.481194209 1
 
0.1%
-2.667889449 1
 
0.1%
-6.909910232 1
 
0.1%
-2.465197367 1
 
0.1%
-3.99181341 1
 
0.1%
-3.145331546 1
 
0.1%
-6.479883345 1
 
0.1%
-4.99998157 1
 
0.1%
-4.672351284 1
 
0.1%
Other values (982) 982
98.2%
(Missing) 8
 
0.8%
ValueCountFrequency (%)
-8.571791335 1
0.1%
-8.042994054 1
0.1%
-7.943988162 1
0.1%
-7.906057256 1
0.1%
-7.824014162 1
0.1%
-7.693862525 1
0.1%
-7.566110388 1
0.1%
-7.503920998 1
0.1%
-7.470603661 1
0.1%
-7.376567763 1
0.1%
ValueCountFrequency (%)
1.085556232 1
0.1%
0.935776165 1
0.1%
0.7760667111 1
0.1%
0.224126414 1
0.1%
0.1960867204 1
0.1%
-0.1340978557 1
0.1%
-0.281881298 1
0.1%
-0.3280029895 1
0.1%
-0.3632662883 1
0.1%
-0.3756889403 1
0.1%

feat.c
Categorical

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

b: 278
d: 255
a: 240
c: 227

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: b
2nd row: d
3rd row: b
4th row: a
5th row: c

Common Values

ValueCountFrequency (%)
b 278
27.8%
d 255
25.5%
a 240
24.0%
c 227
22.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
b 278
27.8%
d 255
25.5%
a 240
24.0%
c 227
22.7%

Most occurring characters

ValueCountFrequency (%)
b 278
27.8%
d 255
25.5%
a 240
24.0%
c 227
22.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1000
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
b 278
27.8%
d 255
25.5%
a 240
24.0%
c 227
22.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 1000
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
b 278
27.8%
d 255
25.5%
a 240
24.0%
c 227
22.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
b 278
27.8%
d 255
25.5%
a 240
24.0%
c 227
22.7%

feat.d
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 9
Missing (%): 0.9%
Memory size: 7.9 KiB

1.0: 511
0.0: 480

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2973
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

ValueCountFrequency (%)
1.0 511
51.1%
0.0 480
48.0%
(Missing) 9
 
0.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1.0 511
51.6%
0.0 480
48.4%

Most occurring characters

ValueCountFrequency (%)
0 1471
49.5%
. 991
33.3%
1 511
 
17.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1982
66.7%
Other Punctuation 991
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1471
74.2%
1 511
 
25.8%
Other Punctuation
ValueCountFrequency (%)
. 991
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2973
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1471
49.5%
. 991
33.3%
1 511
 
17.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2973
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1471
49.5%
. 991
33.3%
1 511
 
17.2%

feat.e
Real number (ℝ)

HIGH CORRELATION
UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.51832075
Minimum: -6.7581764
Maximum: 5.2897088
Zeros: 0
Zeros (%): 0.0%
Negative: 596
Negative (%): 59.6%
Memory size: 7.9 KiB

Quantile statistics

Minimum: -6.7581764
5-th percentile: -3.8409138
Q1: -1.7792963
median: -0.51638151
Q3: 0.80107757
95-th percentile: 2.7744859
Maximum: 5.2897088
Range: 12.047885
Interquartile range (IQR): 2.5803738

Descriptive statistics

Standard deviation: 1.9847034
Coefficient of variation (CV): -3.8291026
Kurtosis: -0.034369636
Mean: -0.51832075
Median Absolute Deviation (MAD): 1.2858637
Skewness: -0.071509945
Sum: -518.32075
Variance: 3.9390475
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-0.8006149557 1
 
0.1%
-0.4105429274 1
 
0.1%
-2.149190977 1
 
0.1%
0.6691611674 1
 
0.1%
-2.496597332 1
 
0.1%
-3.468563012 1
 
0.1%
0.01555496617 1
 
0.1%
0.3305800081 1
 
0.1%
1.550839146 1
 
0.1%
0.9521521358 1
 
0.1%
Other values (990) 990
99.0%
ValueCountFrequency (%)
-6.758176383 1
0.1%
-6.399743569 1
0.1%
-6.335952428 1
0.1%
-6.178752334 1
0.1%
-5.856328822 1
0.1%
-5.819338759 1
0.1%
-5.719050018 1
0.1%
-5.473047809 1
0.1%
-5.271585502 1
0.1%
-5.166574755 1
0.1%
ValueCountFrequency (%)
5.289708782 1
0.1%
4.807481457 1
0.1%
4.698983411 1
0.1%
4.696980464 1
0.1%
4.628818601 1
0.1%
4.580737248 1
0.1%
4.466210225 1
0.1%
4.245676957 1
0.1%
3.971205537 1
0.1%
3.96399422 1
0.1%

feat.f
Real number (ℝ)

HIGH CORRELATION
UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -6.2573455
Minimum: -31.099076
Maximum: 21.567936
Zeros: 0
Zeros (%): 0.0%
Negative: 789
Negative (%): 78.9%
Memory size: 7.9 KiB

Quantile statistics

Minimum: -31.099076
5-th percentile: -19.487967
Q1: -11.654848
median: -6.2624289
Q3: -0.91253298
95-th percentile: 6.3077715
Maximum: 21.567936
Range: 52.667012
Interquartile range (IQR): 10.742315

Descriptive statistics

Standard deviation: 8.0055295
Coefficient of variation (CV): -1.2793811
Kurtosis: 0.17362413
Mean: -6.2573455
Median Absolute Deviation (MAD): 5.3710946
Skewness: 0.027865522
Sum: -6257.3455
Variance: 64.088503
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-4.427601788 1
 
0.1%
5.733868313 1
 
0.1%
-16.23534137 1
 
0.1%
-4.527911115 1
 
0.1%
-11.99220613 1
 
0.1%
-10.86518825 1
 
0.1%
-2.641091002 1
 
0.1%
0.7347983651 1
 
0.1%
-2.958744506 1
 
0.1%
-10.27875464 1
 
0.1%
Other values (990) 990
99.0%
ValueCountFrequency (%)
-31.09907622 1
0.1%
-30.34507874 1
0.1%
-29.10103863 1
0.1%
-28.74414316 1
0.1%
-28.02887003 1
0.1%
-25.93349585 1
0.1%
-25.90821945 1
0.1%
-25.61192808 1
0.1%
-25.58897083 1
0.1%
-25.03581109 1
0.1%
ValueCountFrequency (%)
21.56793583 1
0.1%
20.17426201 1
0.1%
19.88434253 1
0.1%
17.93220262 1
0.1%
17.77268027 1
0.1%
17.3905916 1
0.1%
14.17918456 1
0.1%
14.01412077 1
0.1%
13.32045114 1
0.1%
13.04330909 1
0.1%

feat.g
Categorical

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

z: 341
x: 330
y: 329

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: z
2nd row: x
3rd row: y
4th row: y
5th row: z

Common Values

ValueCountFrequency (%)
z 341
34.1%
x 330
33.0%
y 329
32.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
z 341
34.1%
x 330
33.0%
y 329
32.9%

Most occurring characters

ValueCountFrequency (%)
z 341
34.1%
x 330
33.0%
y 329
32.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1000
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
z 341
34.1%
x 330
33.0%
y 329
32.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 1000
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
z 341
34.1%
x 330
33.0%
y 329
32.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
z 341
34.1%
x 330
33.0%
y 329
32.9%

feat.h
Real number (ℝ)

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.030833
Minimum: 3.4212483
Maximum: 17.431441
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 3.4212483
5-th percentile: 6.6924294
Q1: 8.7001442
median: 10.028309
Q3: 11.528759
95-th percentile: 13.25505
Maximum: 17.431441
Range: 14.010193
Interquartile range (IQR): 2.8286147

Descriptive statistics

Standard deviation: 2.0222002
Coefficient of variation (CV): 0.20159843
Kurtosis: 0.039840298
Mean: 10.030833
Median Absolute Deviation (MAD): 1.4211137
Skewness: -0.13026894
Sum: 10030.833
Variance: 4.0892935
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
10.25419887 1
 
0.1%
6.754303427 1
 
0.1%
11.24278619 1
 
0.1%
12.68298848 1
 
0.1%
9.352376556 1
 
0.1%
10.28876694 1
 
0.1%
7.836294432 1
 
0.1%
9.988045316 1
 
0.1%
9.277988632 1
 
0.1%
8.493310924 1
 
0.1%
Other values (990) 990
99.0%
ValueCountFrequency (%)
3.421248303 1
0.1%
3.475701331 1
0.1%
3.945908562 1
0.1%
4.151417736 1
0.1%
4.22770175 1
0.1%
4.51677224 1
0.1%
4.614397502 1
0.1%
4.86227061 1
0.1%
5.022192772 1
0.1%
5.217552022 1
0.1%
ValueCountFrequency (%)
17.43144145 1
0.1%
16.16747908 1
0.1%
15.85780148 1
0.1%
15.69748807 1
0.1%
14.83414122 1
0.1%
14.79040265 1
0.1%
14.62396292 1
0.1%
14.62363889 1
0.1%
14.48832726 1
0.1%
14.47632645 1
0.1%

feat.i
Real number (ℝ)

HIGH CORRELATION
UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.5186067
Minimum: -6.7634268
Maximum: 5.3157286
Zeros: 0
Zeros (%): 0.0%
Negative: 598
Negative (%): 59.8%
Memory size: 7.9 KiB

Quantile statistics

Minimum: -6.7634268
5-th percentile: -3.8868159
Q1: -1.773089
median: -0.50608495
Q3: 0.80304068
95-th percentile: 2.7870572
Maximum: 5.3157286
Range: 12.079155
Interquartile range (IQR): 2.5761297

Descriptive statistics

Standard deviation: 1.9843781
Coefficient of variation (CV): -3.8263643
Kurtosis: -0.029327021
Mean: -0.5186067
Median Absolute Deviation (MAD): 1.2926208
Skewness: -0.071304314
Sum: -518.6067
Variance: 3.9377566
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-0.8280728697 1
 
0.1%
-0.3358788662 1
 
0.1%
-2.138754969 1
 
0.1%
0.7567881697 1
 
0.1%
-2.505539361 1
 
0.1%
-3.490892139 1
 
0.1%
0.07054932741 1
 
0.1%
0.3639592021 1
 
0.1%
1.607569457 1
 
0.1%
0.9758482567 1
 
0.1%
Other values (990) 990
99.0%
ValueCountFrequency (%)
-6.763426764 1
0.1%
-6.398746196 1
0.1%
-6.376277409 1
0.1%
-6.192309923 1
0.1%
-5.845633414 1
0.1%
-5.82995219 1
0.1%
-5.756198292 1
0.1%
-5.522454828 1
0.1%
-5.215154372 1
0.1%
-5.174046974 1
0.1%
ValueCountFrequency (%)
5.315728559 1
0.1%
4.842965736 1
0.1%
4.715903484 1
0.1%
4.646257329 1
0.1%
4.588374531 1
0.1%
4.550044682 1
0.1%
4.446508874 1
0.1%
4.248004011 1
0.1%
3.966186668 1
0.1%
3.947004253 1
0.1%

Interactions

[Figures: 36 pairwise interaction plots; images not reproduced.]

Correlations


Auto

The auto setting selects an interpretable pairwise metric for each pair of columns, according to the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramér's V, [0,1]
  • Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used to discretize the numerical column in a Numerical-Categorical pair can be changed using config.correlations["auto"].n_bins. The number of bins determines the granularity of the association being measured.

This configuration uses the recommended metric for each pair of columns.
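
The mapping above can be sketched as a small dispatch function. This is only an illustration, not the report generator's implementation: the names auto_method and discretize are hypothetical, and equal-width binning is an assumption, since the report does not state which binning rule it applies.

```python
def auto_method(type_a, type_b):
    """Return (method, value range) for a pair of column types, per the mapping above."""
    valid = {"Numerical", "Categorical"}
    if type_a not in valid or type_b not in valid:
        raise ValueError("column types must be 'Numerical' or 'Categorical'")
    if type_a == type_b == "Numerical":
        return "Spearman's rho", (-1, 1)
    # Any pair involving a categorical column uses Cramér's V;
    # a numerical partner is discretized first.
    return "Cramér's V", (0, 1)


def discretize(values, n_bins=10):
    """Equal-width binning into integer codes 0..n_bins-1 (binning rule is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum falls exactly on the last edge, so clamp it into the final bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


method, value_range = auto_method("Numerical", "Categorical")
codes = discretize([0.0, 2.5, 5.0, 7.5, 10.0], n_bins=4)
```

A smaller n_bins merges more distinct numerical values into the same bin, so the resulting association is measured at a coarser granularity.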

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
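
As a minimal sketch of that calculation, the snippet below ranks both variables (tied values share the mean of their positions) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names and the toy data are illustrative assumptions, not taken from the profiled dataset.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
rho = spearman_rho(x, y)
r = pearson(x, y)
```

Because y increases whenever x does, the ranks coincide and ρ reaches its maximum, while Pearson's r on the raw values stays below 1.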

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the slope of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
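
The following sketch implements that formula directly and checks the invariance property on toy data. The function name pearson_r and the sample values are illustrative assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([2.0 * a + 3.0 for a in x], [0.5 * b - 1.0 for b in y])
# Exact linear relationships give r = 1 whether the line is steep or shallow.
r_steep = pearson_r(x, [10.0 * a for a in x])
r_shallow = pearson_r(x, [0.1 * a for a in x])
```

Rescaling by a negative factor would flip the sign of r while keeping its magnitude.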

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
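
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below with a hypothetical function name and toy data. Statistical libraries often report a tie-adjusted variant (tau-b) instead, so values from this sketch may differ from the report's when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
```

In this example five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.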

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
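
A minimal sketch of a bias-corrected Cramér's V in the form commonly attributed to Bergsma (2013) follows. It assumes a plain Pearson chi-squared statistic without continuity correction; the function name and toy data are illustrative, and the report's own implementation may differ in detail.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_label, row_count in row_totals.items():
        for col_label, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a constant column carries no association
    return sqrt(phi2_corr / denom)


perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
```

The first call pairs every "x" with "p" and every "y" with "q", so the association is perfect; the second fills all four cells equally, so the columns are independent.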

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
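
One way to read nullity correlation is as Pearson's r between the "is present" indicators of two columns. The sketch below follows that reading using only the standard library; the function name, the use of None for a missing cell, and the toy columns are assumptions made for illustration.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is fully complete or fully empty
    return cov / sqrt(var_a * var_b)


col_a = [1.0, None, 3.0, None]
col_b = [None, 2.0, None, 4.0]  # missing exactly where col_a is present
col_c = [5.0, None, 7.0, None]  # missing exactly where col_a is missing

opposite = nullity_correlation(col_a, col_b)
aligned = nullity_correlation(col_a, col_c)
```

A value near +1 means two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing. Fully complete columns have no defined nullity correlation, which is why the sketch returns None for them.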

Sample

     response     feat.a     feat.b  feat.c  feat.d     feat.e      feat.f  feat.g     feat.h     feat.i
  0         1  -0.681427  -5.493698       b     0.0  -0.800615   -4.427602       z  10.254199  -0.828073
  1         1   0.309468  -5.559933       d     1.0  -1.155514   -0.799094       x   9.084749  -1.109698
  2         1   5.676125  -4.026970       b     1.0  -3.396331   -0.631966       y   8.753848  -3.417417
  3         1   1.211525  -4.198263       a     1.0  -1.894569  -16.273262       y  12.191295  -1.904801
  4         1   1.387863  -7.824014       c     1.0   4.696980  -22.208877       z   9.626686   4.715903
  5         1   6.145195  -2.439140       c     0.0  -0.574830   11.642609       y  12.362962  -0.521423
  6         1   2.382749  -3.625411       a     0.0   1.326984   -4.148881       z   9.226122   1.287618
  7         1  -2.795184  -0.375689       c     1.0  -0.869053   -2.994862       x   7.973038  -0.839326
  8         0  -1.060559  -2.972203       b     0.0   0.719649  -15.543748       z  12.893124   0.718503
  9         1  -0.336986  -4.670439       b     1.0  -0.605454    3.060399       y   9.803020  -0.548610

     response     feat.a     feat.b  feat.c  feat.d     feat.e      feat.f  feat.g     feat.h     feat.i
990         1   3.027287  -5.645709       b     1.0  -4.847993   -8.070246       x  12.043355  -4.894286
991         1  -2.222620  -2.611733       a     0.0   0.735233   -2.876741       x  10.090726   0.816545
992         0   2.363733  -3.629801       b     0.0  -5.109591   -7.578162       y   9.301541  -5.119936
993         0   0.360079  -5.105157       d     0.0  -1.393937  -21.575596       y  10.537327  -1.397865
994         0   1.939686  -5.920013       b     0.0   0.098981  -17.421105       z  11.705579   0.084855
995         0   0.730074  -3.885035       b     0.0  -3.356949  -12.803344       z  11.204110  -3.396673
996         1   4.211548  -3.617253       a     0.0   2.034995    6.995753       z   9.208089   2.069752
997         1  -3.053301  -3.583830       c     1.0   1.929012   -7.013105       z   7.637862   1.856356
998         1  -0.567850  -3.194716       c     1.0  -1.849712    4.204816       z  11.725868  -1.862466
999         1   0.252428  -4.690728       d     1.0   1.742044   -4.564031       y   7.909709   1.747037